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1 Introduction 

The goal of this paper is to present an overview of the software collection 
for the solution of linear and nonlinear semidefinite optimization problems 
Pennon. In the first part we present theoretical and practical details of the 
underlying algorithm and several implementation issues. In the second part 
we introduce the particular codes Pensdp, Penbmi and Pennon, focus on 
some specific features of these codes and show how they can be used for the 
solution of selected problems. 

We use standard notation: is the space of real symmetric matrices of 

dimension m x m and S™ the space of positive semidefinite matrices from 
The inner product on is defined by {A,B)gm := trace(Ai3). Notation 
A ^ B for A,Bg S’" means that the matrix B — A is positive semidefinite. 
The norm || • || is always the £2 norm in case of vectors and the spectral norm 
in case of matrices, unless stated otherwise. Finally, for ^ : S’" —>■ S’" and 
X,Y G S’", D(!>{X)[Y] denotes the directional derivative of <£> with respect to 
X in direction Y. 


2 The main algorithm 

2.1 Problem formulation 

The nonlinear semidefinite problems can be written in several different ways. 
In this section, for the sake of simplicity, we will use the following formulation: 
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min fix) (1) 

subject to 

A{x) =4 0. 

Here / : K" —>■ K and A : M” —>■ S’” are twice continuously differentiable 
mappings. 

Later in sections on linear SDP, BMI and nonlinear SDP, we will give 
more specific formulations of the problem. However, the algorithm and theory 
described in this section applies, with some exceptions discussed later, to all 
these specific formulations. 

2.2 The algorithm 

The basic algorithm used in this article is based on the nonlinear rescaling 
method of R. Polyak [30] and was described in detail in [16] and [31]. Here we 
briefly recall it and stress points that will be needed in the rest of the paper. 

The algorithm is based on the choice of a smooth penalty/barrier func¬ 
tion : S’" —>■ S™ that satisfies a number of assumptions (see [16, 31]) 
guaranteeing, in particular, that for any p > 0 

A{x) =4 0 <Pp{A{x)) =4 0 

for (at least) all x such that A{x) =4 0. Thus for any p > 0, problem (1) has 
the same solution as the following “augmented” problem 

min fix) (2) 

subject to 

<Pp{A{x)) =4 0. 

The Lagrangian of (2) can be viewed as a (generalized) augmented La- 
grangian of (1): 


F{x,U,p) = f{x) + {U,<Pp (^(a;)))s™ ; (3) 

here U € S™ is a Lagrangian multiplier associated with the inequality con¬ 
straint. 

The algorithm below can be seen as a generalization of the Augmented 
Lagrangian method. 

Algorithm 1. Let x^ and be given. Let p^ > 0. For fc = 1, 2,... repeat 
until a stopping criterion is reached: 

1. = argmin Fix, ,p^) 

2 . [/'=+! = D%{A{x'^+^))[U'^] 

3. /+! < / . 
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Details of the algorithm, the choice of the penalty function the choice 
of initial values of cc, U and p, the approximate minimization in Step (i) and 
the update formulas, will be discussed in subsequent sections. The next section 
concerns the overview of theoretical properties of the algorithm. 

2.3 Convergence theory overview 

Throughout this section we make the following assumptions on problem (1): 

{Al) X* = argmin{/(a;)|x G 17} exists, where 17 = {x G ]R”|^(x) ^ 0}. 

{A2) The Karush-Kuhn-Tucker necessary optimality conditions hold in x*, i.e., 
there exists U* G S™ such that 

nx*) + [{U*,A.)]Z, = 0 
{U*,A{x*)) = 0 
U*h0 

A{x*) ^ 0, (4) 

where Ai denotes the i-th partial derivative of .4 at x* {i = l,...,n). 
Moreover the strict complementary is satisfied. 

(4.3) The nondegeneracy condition holds, i.e., if for 1 < r < m the vectors 
Sm-r+i, ■ ■ ■ ,Sm & form a basis of the null space of the matrix 4(x*), 
then the following set of n-dimensional vectors is linearly independent: 

Vij = {sj AiSj,... ,sj AnSjZ, m — r + l<i< j < m. 

(44) Define Eq = (sm-r+i, ■ ■ ■, Sm), where Sm-r+i, ■ ■ ■ ,Sm are the vectors in¬ 
troduced in assumption (43). Then the cone of critical directions at x* is 
defined as 

C{x*) = |h G M” : ^ h.E^A^Eo < 0,/'(x*)^h = 0 

With this the following second order sufficient optimality condition is 
assumed to hold at {x*,U*)-. For all h G C{x*) with h ^ 0 the inequality 

{L'Ux*,U*) + H{x*,U*))h>0, 

is satished, where L is the classic Lagrangian of (1) defined as 

L{x, U) = /(x) -h {U, A{x )), 

H{x*, U*) is defined entry-wise by 

H{x\U*)i,j = -2{U*,A[A{x*)]^Aj) (5) 

(see, for example, [4, p. 490]) and [4(x*)]'l' is the Moore-Penrose inverse 
of 4(x*). 
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(A5) Let 


f2p = {x G IR”|A(x) ^ pim} ■ 

Then the following growth condition holds: 

d-TT > 0 and T > 0 such that max {||A(a;)|| | x € < t. 


(6) 


Using these assumptions the following local convergence result can be es¬ 
tablished. 


Theorem 1. Let A{x) he twice continuously differentiable and assumptions 
(Al) to (A5) hold for the pair {x*, U*). Then there exists a penalty parameter 
po > 0 large enough and a neighbourhood V of {x*,U*) such that for all 
iU,p)GV: 

a) There exists a vector 

X = x(U,p) = argmin{F(a;, U,p)\x G R"} 
such that Va;F(x, U,p) =0. 

b) For the pair x and U = U{U,p) = DFp {A{x{U,p))) [U] the estimate 

max{p - x*||, \\U - C/*||} < Cp\\U - U*\\ (7) 

holds, where C is a constant independent of p. 

c) x{U*,p)=x* and U{U*,p) = U*. 

d) The function F(x, U,p) is strongly convex with respect to x in a neighbor¬ 
hood ofx(U,p). 

The proof for Theorem 1 as well as a precise definition of the neighborhood 
V is given in [31]. A slightly modified version of Theorem 1 with a particular 
choice of the penalty function (p can be found in [21]. An alternative conver¬ 
gence theorem using slightly different assumptions is presented in [29]. 

An immediate consequence of Theorem 1 is that Algorithm 1 converges 
with a linear rate of convergence. If —>■ 0 for fc —>■ oo is assumed for the 
sequence of penalty parameters then the rate of convergence is superlinear. 

Remark 1. a) Let x^ be a local minimum of problem (1) satisfying assump¬ 
tions (A2) to (A5) and denote by [/+ the corresponding (unique) optimal 
multiplier. Assume further that there exists a neighborhood S,, of x^ such 
that there is no further first order critical point x x'^ in Su{x'^). Then all 
statements of Theorem 1 remain valid, if we replace (x*, U*) by (x^, U^) and 
the function x{U,p) by 

xioc{U,p) = argmin{F(x, 17,p)|x G ]R”,x G A^}. (8) 

Moreover Theorem 1 d) guarantees that F{x,U,p) is strongly convex in a 
neighborhood of x+ for all {U,p) G V. Consequently any local descent method 
applied to the problem 
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{i') Find such that II [/'=,/) II = 0 (9) 

will automatically find a solution, which satisfies the additional constraint 
^fc+i g provided it is started with close enough to Moreover, Algo¬ 
rithm 1 will converge to the local optimum x'^ (see [31] for more details). 

b) A global convergence result can be found in [31]. 


2.4 Choice of 

The penalty function of our choice is defined as follows: 

<Pp{A{x)) = -/(M(x) - pl)~^ - pi . (10) 

The advantage of this choice is that it gives closed formulas for the first and 
second derivatives of <Ip. Defining 


Z{x) = —{A{x) — pi) 


-1 


( 11 ) 


we have (see [16]): 
d 


^^^p{Aix)) =p^Z{x)^^^^Z{x) 
a, (A( \\ 2^/ 

<I>p{A{x)) =p Z{x) ( Z{x)- 


dxidx 


dxi 


dxi 


dxidxj 


dA{x) . 9A(a;)' 


dxj 


( 12 ) 


(13) 


2.5 The modified Newton method 

To solve the (possibly nonconvex) unconstrained minimization problem in 
Step 1, we use the following modification of the Newton method with line- 
search: 

Algorithm 2. Given an initial iterate a;o, repeat for all fc = 0,1, 2,... until a 
stopping criterion is reached 

1. Compute the gradient gk and Hessian of F a.t Xk- 

2. Try to factorize Hk by Cholesky decomposition. If Hk is factorizable, set 
H = Hk and go to Step 4. 

3. Compute /3 G [—Amin) —2Amin] > where Amin is the minimal eigenvalue of Hk 

and set H = Hk + (H- ^ 

4. Compute the search direction dk = —H~^gk. 

5. Perform line-search in direction dk- Denote the step-length by Sk- 

6. Set Xk+i =Xk + Skdk- 



6 


Michal Kocvara and Michael Stingl 


The step-length s in direction d is calculated by a gradient free line-search 
algorithm that tries to satisfy the Armijo condition. Obviously, for a convex F, 
Algorithm 2 is just the damped Newton method, which is known to converge 
under standard assumptions. 

If, in the non-convex case, the Cholesky factorization in Step 2 fails, we 
calculate the value of P in Step 3 in the following way: 

Algorithm 3. For a given /3o > 0 

1. Set P = Pq. 

2. Try to factorize H + pi hy the Cholesky method. 

3. If the factorization fails due to a negative pivot element, go to step 4, 
otherwise go to step 5. 

4. If /3 > Po, set P = 2P and continue with 2. Otherwise go to step 6. 

5. If /3 < /3o, set /3 = ^ and continue with step 2. Otherwise STOP. 

6. Set P = 2P and STOP. 

Obviously, when Algorithm 3 terminates we have p € [—Amin,—2Amin]- It is 
well known from the nonlinear programming literature that under quite mild 
assumptions any cluster point of the sequence generated by Algorithm 2 is a 
hrst order critical point of problem in Step I of Algorithm 1. 

Remark 2. There is one exception, when we use a different strategy for the cal¬ 
culation of p. The exception is motivated by the observation that the quality 
of the search direction gets poor, if we choose P too close to —Amin- Therefore, 
if we encounter bad quality of the search direction, we use a bisection tech¬ 
nique to calculate an approximation of Amin, denoted by and replace P 

by 

Remark 3. Whenever we will speak about the Newton method or Newton 
system, later in the paper, we will always have in mind the modified method 
described above. 

2.6 How to solve the linear systems? 

In both algorithms proposed in the preceding sections one has to solve repeat¬ 
edly linear systems of the form 

{H + D)d = -g, (14) 

where H is a diagonal matrix chosen such that the matrix F[ + D is positive 
definite. There are two categories of methods, which can be used to solve 
problems of type (14): direct and iterative methods. Let us first concentrate 
on the direct methods. 
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Cholesky method 

Since the system matrix in (14) is always positive definite, our method of 
choice is the Cholesky method. Depending on the sparsity structure of iJ, we 
use two different realizations: 

• If the fill-in of the Hessian is below 20% , we use a sparse Cholesky solver 
which is based on ideas of Ng and Peyton [27]. The solver makes use of 
the fact that the sparsity structure is the same in each Newton step in all 
iterations. Hence the sparsity pattern of H, reordering of rows and columns 
to reduce the fill-in in the Cholesky factor, and symbolic factorization of 
H are all performed just once at the beginning of Algorithm 1. Then, 
each time the system (14) has to be solved, the numeric factorization is 
calculated based on the precalculated symbolic factorization. Note that we 
added stabilization techniques described in [34] to make the solver more 
robust for almost singular system matrices. 

• Otherwise, if the Hessian is dense, we use the ATLAS implementation of 
the LAPACK Cholesky solver DPOTRF. 

Iterative methods 

We solve the system Hd = —g with a symmetric positive definite and, possibly, 
ill-conditioned matrix H = H + D. We use the very standard preconditioned 
conjugate gradient method. The algorithm is well known and we will not 
repeat it here. The algorithm is stopped when the normalized residuum is 
sufficiently small: 

11 -^ 4 + 511/11511 < e. 

In our tests, the choice e = 5 • 10“^ was sufficient. 

Preconditioners 

We are looking for a preconditioner—a matrix M G S"—such that the system 
M~^Hd = —M~^g can be solved more efficiently than the original system 
Hd = —g. Apart from standard requirements that the preconditioner should 
be efficient and inexpensive, we also require that it should only use Hessian- 
vector products. This is particularly important in the case when we want to 
use the Hessian-free version of the algorithm. 

Diagonal preconditioner 

This is a simple and often-used preconditioner with 

M = diag(i7). 

On the other hand, being simple and general, it is not considered to be very 
efficient. Furthermore, we need to know the diagonal elements of the Hessian. 
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It is certainly possible to compute these elements by Hessian-vector products. 
For that, however, we would need n gradient evaluations and the approach 
would become too costly. 

L-BFGS preconditioner 

Introduced by Morales-Nocedal [25], this preconditioner is intended for ap¬ 
plication within the Newton method. (In a slightly different context, the 
(L-)BFGS preconditioner was also proposed in [10].) The algorithm is based 
on the limited-memory BFGS formula ([28]) applied to successive CG (instead 
of Newton) iterations. The preconditioner, as used in Pennon is described in 
detail in [18]. Here we only point out some important features. 

As recommended in the standard L-BFGS method, we used 16-32 cor¬ 
rection pairs, if they were available. Often the GG method finished in less 
iterations and in that case we could only use the available iterations for the 
correction pairs. If the number of CG iterations is higher than the required 
number of correction pairs /i, we may ask how to select these pairs. We have 
two options: Either we take the last p, pairs or an “equidistant” distribution 
over all CG iterations. The second option is slightly more complicated but we 
may expect it to deliver better results. 

The L-BFGS preconditioner has the big advantage that it only needs 
Hessian-vector products and can thus be used in the Hessian-free approaches. 
On the other hand, it is more complex than the above preconditioners; also 
our results are not conclusive concerning the efficiency of this approach. For 
many problems it worked satisfactorily, for some, on the other hand, it even 
lead to higher number of CG steps than without preconditioner. 

2.7 Multiplier and penalty update 

For the penalty function <l>p from (10), the formula for update of the matrix 
multiplier U in Step (ii) of Algorithm 1 reduces to 

Uk+i = (/)2z(x'=+i)C/'=Z(x'=+^) (15) 

with Z defined as in (11). Note that when is positive definite, so is . 
We set equal to a positive multiple of the identity. 

Numerical tests indicate that big changes in the multipliers should be 
avoided for the following reasons. Big change of U means big change of the 
augmented Lagrangian that may lead to a large number of Newton steps in 
the subsequent iteration. It may also happen that already after few initial 
steps the multipliers become ill-conditioned and the algorithm suffers from 
numerical difficulties. To overcome these, we do the following: 

1. Calculate using (15). 

2. Choose a positive /m < 1, typically 0.5. 

3. Compute Aa = niin 
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4. Update the current multiplier by 


U 


new 




Given an initial iterate , the initial penalty parameter is chosen large 
enough to satisfy the inequality 


p^I-A{x^) >■ 0 . 

Let Amax(Al(a;^''"^)) € (—oo,p^) denote the maximal eigenvalue of A{x^^^), 
TT < 1 be a constant factor, depending on the initial penalty parameter 
(typically chosen between 0.3 and 0.6) and Xfeas be a feasible point. Let I be 
set to 0 at the beginning of Algorithm 1. Using these quantities, our strategy 
for the penalty parameter update can be described as follows: 

1. If < Pepsi set 7 = 1 and go to 6 . 

2. Calculate Amax(Al(a;^+^)). 

3. If 7 rp^ > Amax(Al(a;^+^)), set 7 = tt, / = 0 and go to 6 . 

4. If / < 3, set 7 = (Amax(Al(a;*+^)) +p^) /(2p*), set U= I + 1 and go to 6 . 

5. Let 7 = TT, find A G (0,1) such, that 

Amax (Al(Aa;''+^ + (1 - A)xfeas)) < TTp'", 

set = Ax^+^ + (1 ~ A)a;feas and I := 0 . 

6 . Update current penalty parameter by 


The reasoning behind steps 3 to 5 is as follows: As long as the inequality 


Amax(Al(a;'"+^)) < Trp'^ 


(16) 


holds, the values of the augmented Lagrangian in the next iteration remain 
finite and we can reduce the penalty parameter by the predefined factor tt. As 
soon as inequality (16) is violated, an update using tt would result in an infinite 
value of the augmented Lagrangian in the next iteration. Therefore the new 
penalty parameter should be chosen from the interval (Amax(Al(a;^+^)),p^). 
Because a choice close to the left boundary of the interval leads to large values 
of the augmented Lagrangian, while a choice close to the right boundary slows 
down the algorithm, we choose 7 such that 

fc+i ^ Ainax(Al(a;'^+^)) +p^ 

^ 2 

In order to avoid stagnation of the penalty parameter update process due to 
repeated evaluations of step 4, we redefine using the feasible point Xfeas 
whenever step 4 is executed in three successive iterations; this is controlled by 
the parameter 1. If no feasible point is yet available. Algorithm I is stopped 
and restarted from the scratch with a different choice of initial multipliers. 
The parameter peps is typically chosen as 10“®. In case we detect problems 
with convergence of Algorithm I, peps is decreased and the penalty parameter 
is updated again, until the new lower bound is reached. 
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2.8 Initialization and stopping criteria 

Initialization 


Algorithm 1 can start with an arbitrary primal variable x G M". Therefore 
we simply choose = 0. For the description of the multiplier initialization 
strategy we rewrite problem (SDP) in the following form: 

min fix) (17) 

subject to 

Ai{x) =4 0, i = 1,... ,a. 


Here Ai (x) G are diagonal blocks of the original constrained matrix A{x) 

and we have ct = 1 if A{x) consists of only one block. Now the initial values 
of the multipliers are set to 

Uj = Hjljrij ) J = 1) • ■ • ) O') 


where are identity matrices of order mj and 


Mi = 


nij max 

■' l<e<n 


1 + 

9f(oc) 

dxn 

1 + 

dA(x) 

dxg_ 


(18) 


Given the initial iterate x^, the initial penalty parameter is chosen large 
enough to satisfy the inequality 


p^I-Aix^) ^ 0 . 


Stopping criterion in the sub-problem 


In the first iterations of Algorithm 1, the approximate minimization of F 
is stopped when \\-^F{x,U,p)\\ < a, where a = 0.01 is a good choice in 
most cases. In the remaining iterations, after a certain precision is reached, 
a is reduced in each outer iteration by a constant factor, until a certain a 
(typically 10“”^) is reached. 


Stopping eriterion for the main algorithm 


We have implemented two different stopping criteria for the main algorithm. 

• First alternative: The main algorithm is stopped if both of the following 
inequalities hold: 

\f{x^)-F{x\U\p)\ |/(a:fe)-/(a:fe-^)| 

l + |/(a;'=)| l + |/(a:'=)| ’ 

where Si is typically 10“^. 

• Second alternative: The second stopping criterion is based on the KKT- 
conditions. Here the algorithm is stopped, if 

niin{Amax (Al(x)) , |(Al(x), U)\ , \\Va:F{x,U,p)\\} < £ 2 - 
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2.9 Complexity 

The computational complexity of Algorithm 1 is clearly dominated by Step 1. 
In each step of the Newton method, there are two critical issues: assembling 
of the Hessian of the augmented Lagrangian and solution of the linear system 
of equations (the Newton system). 

Hessian assembling 

Full matrices 

Assume first that all the data matrices are full. The assembling of the Hessian 
(13) can be divided into the following steps: 

• Calculation of Z{x) —>■ 0{m^ + rrv^n). 

• Calculation of Z(a;)HZ(x)—>■ 0{m^). 

• Calculation of Z(a;)HZ(a;)A'(a;)Z(a;) for alH —^ Olrri^n). 

• Assembling the rest —>■ 0{m?n?). 

Now it is straightforward to see that an estimate of the complexity of assem¬ 
bling of (13) is given by 0{m^n + m^n^). 

Many optimization problems, however, have very sparse data structure and 
therefore have to be treated by sparse linear algebra routines. We distinguish 
three basic types of sparsity. 

The block diagonal case 

The first case under consideration is the block diagonal case. In particular, 
we want to describe the case, where 

• the matrix A{x) consists of many (small) blocks. 

In this situation the original SDP problem (1) can be written in the form (17). 
If we define m = max{mi \ i = 1,... ,d} can estimate the computational 
complexity of the Hessian assembling by 0(dfh^n -\- dm^n^). An interesting 
subcase of problem (17) is when 

• each of the matrix constraints Ai{x) involves just a few components of x. 

If we denote the maximal number of components of x on which each of the 
blocks Ai(x), i = 1, 2 ,..., d depends by n, our complexity formula becomes 
0{dm^n -|- dfh?n^). If we further assume that the numbers h and m are of 
order 0(1), then the complexity estimate can be further simplified to 0{d). 
A typical example for this sparsity class are the ‘mater’ problems discussed 
in section 3. 
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The case when A{x) is dense and A'^(x) are sparse 

Let us first mention that for any index pair (i, j) € {1,...,n} x {1,... ,n} the 
non-zero structure of the matrix A”j (x) is given by (a subset of the) intersec¬ 
tion of the non-zero index sets of the matrices A'^ (x) and Aj (x). We assume 
that 

• there are at most 0(1) non-zero entries in A^(x) for all i = 1,..., n. 

Then the calculation of the term 


[{Zix)UZix),A'l^ix))]l.^^ 

can be performed in O(n^) time. In the paper by Fujisawa, Kojima and Nakata 
on exploiting sparsity in semidefinite programming [8] several ways are pre¬ 
sented how to calculate a matrix of the form 

D 1 S 1 D 2 S 2 (19) 

efficiently, if Di and D 2 are dense and and S 2 are sparse matrices. If our 
assumption above holds, the calculation of the matrix 

[{Z{x)UZ{x)A'j (x)Z(x ), A', (x))] 

can be performed in 0{n^) time. Thus, recalling that for the calculation of 
Z (x) we have to compute the inverse of an (TOXTO)-matrix, we get the following 
complexity estimate for the Hessian assembling: 0{m^ +n^). Note that in our 
implementation we follow the ideas presented in [8]. Many linear SDP prob¬ 
lems coming from real world applications have exactly the sparsity structure 
discussed in this paragraph. 

The case when A(x) and the Cholesky factor of A{x) is sparse 

Also in this case we can conclude that all partial derivatives of A{x) of first 
and second order are sparse matrices. Therefore it suffices to assume that 

• the matrix A(x) has at most 0(1) non-zero entries. 

We have to compute expressions of type 

{A{x) — pI)~^U{A{x) — pl)~^ and {A{x)—pl)~^. 

Note that each of the matrices above can be calculated by maximally two 
operations of the type {A — where M is a symmetric matrix. Now 

assume that not only A{x) but also its Cholesky factor is sparse. Then, obvi¬ 
ously, the Cholesky factor of {A{x) — pi), denoted by L, will also be sparse. 
This leads to the following assumption: 

• Each column of L has at most 0(1) non-zero entries. 
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Now the i-th column of C := {A{x) — pi) can then be computed as 

and the complexity of computing C by Cholesky factorization is 0{nA), com¬ 
pared to 0(m^) when computing the inverse of iA{x) — pi) and its multi¬ 
plication by U. So the overall complexity of Hessian assembling is of order 
0(to^ -I- n^). 

Remark 4- Recall that in certain cases, we do not need to assemble the Hessian 
matrix. In this case the complexity estimates can be improved significantly; 
see the next section. 


Solution of the Newton system 

As mentioned above, the Newton system 

Hd = -g (20) 

can either be solved by a direct (Cholesky) solver or by an iterative method. 
Cholesky method 

The complexity of Cholesky algorithm is 0{rA) for dense matrices and 0{n'^), 
1 < K < 3 for sparse matrices, where k depends on the sparsity structure of 
the matrix, going from a diagonal to a full matrix. 

Iterative algorithms 

From the complexity viewpoint, the only demanding step in the CG method 
is a matrix-vector product with a matrix of dimension n. For a dense matrix 
and vector, it needs O(n^) operations. Theoretically, in exact arithmetics, the 
CG method needs n iterations to find an exact solution of the system, hence 
it is equally expensive as the Cholesky algorithm. There are, however, two 
points that may favor the CG method. 

First, it is well known that the convergence behavior of the GG method can 
be significantly improved by preconditioning. The choice of the preconditioner 
M will be the subject of the next section. 

The second—and very important—point is that we actually do not need an 
exact solution of the Newton system. On the contrary, a rough approximation 
of it will do (see [11, Thm. 10.2]). Hence, in practice, we may need just a 
few CG iterations to reach the required accuracy. This is in contrast with 
the Gholesky method where we cannot control the accuracy of the solution 
and always have to compute the exact one (within the machine precision). 
Note that we always start the GG method with initial approximation do = 0; 
thus, performing just one GG step, we would obtain the steepest descend 
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method. Doing more steps, we improve the search direction toward the Newton 
direction; note the similarity to the Toint-Steihaug method [28]. 

Summarizing these two points: when using the CG algorithm, we may 
expect to need just O(n^) operations, at least for well-conditioned (or well- 
preconditioned) systems. 

Note that we are still talking about dense problems. The use of the CG 
method is a bit nonstandard in this context—usually it is preferable for large 
sparse problems. However, due to the fact that we just need a very rough 
approximation of the solution, we may favor it to the Cholesky method also 
for medium-sized dense problems. 

Approximate Hessian formula 

When solving the Newton system by the CG method, the Hessian is only 
needed in a matrix-vector product of the type Hv := V‘^F{x^)v. Because we 
only need to compute the products, we may use a finite difference formula for 
the approximation of this product 

h 

with h = {1 + ||a:^|| 2 -\/e); see [28]. In general, e is chosen so that the formula 
is as accurate as possible and still not influenced by round-off errors. The 
“best” choice is obviously case dependent; in our implementation, we use 
e = 10“®. Hence the complexity of the CG method amounts to the number of 
CG iterations times the complexity of gradient evaluation, which is of order 
0(m^ + Kn), where K denotes the maximal number of nonzero entries in 
A[(x), i = 1,2,... ,n. This may be in sharp contrast with the Gholesky method 
approach when we have to compute the full Hessian and solve the system by 
Gholesky method. Again, we have the advantage that we do not have to store 
the Hessian in the memory. 

This approach is clearly not always applicable. With certain SDP problems 
it may happen that the Hessian computation is not much more expensive 
than the gradient evaluation. In this case the Hessian-free approach may be 
rather time-consuming. Indeed, when the problem is ill-conditioned and we 
need many GG iterations, we have to evaluate the gradient many (thousand) 
times. On the other hand, when using Cholesky method, we compute the 
Hessian just once. 


3 PENSDP 

When both functions in (1) are linear, the problem simplifies to a standard 
(primal or dual, as you like) linear semidefinite programming problem (LSDP) 
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min (22) 

subject to 

n 

'y ^ ^k-^k ^0 ^ 0 . 

fe=l 

Here we write explicitly all matrix inequality constraints, as well as linear 
constraints, in order to introduce necessary notation. In the next sections, we 
will present some special features of the code Pensdp designed to solve (22), 
as well as selected numerical examples demonstrating it capabilities. 


3.1 The code PENSDP 


Special features 


Stopping criteria 


In the case of linear semidefinite programs, we have additionally adopted the 
DIMACS criteria [24]. To define these criteria, we denote A{x) = X]fc=i ^kAk- 
Recall that U is the corresponding Lagrangian multiplier and let A*(-) denote 
the adjoint operator to A(-). The DIMACS error measures are defined as 


erri = 


\\ A*{U)-f\\ 
1 + II./II 


err. = max|0,^^^| 

{Ao,U)-f^x 
l + \{Ao,U)\ + \rx\ 


err 4 = max < 0 


~Aniin("4(a^) ~ ^o) 


errs = 


1 + Poll 

{Aix)-Ao,U) 

1 + 1(^0) U)\ + l/^xj 


Here, erri represents the (scaled) norm of the gradient of the Lagrangian, err 2 
and err 4 is the dual and primal infeasibility, respectively, and errs and errs 
measure the duality gap and the complementarity slackness. Note that, in our 
code, err 2 = 0 by definition; also errs that involves the slack variable (not used 
in our problem formulation) is automatically zero. If the “DIMACS stopping 
criterion” is activated we require that 


errfe < 

^DIMACS? fcG{I,4,5,6}. 


Implicit Hessian formula 


As mentioned before, when solving the Newton system by the CG method, 
the Hessian is only needed in a matrix-vector product of the type Hv := 
Sj‘^F{x^)v. Instead of computing the Hessian matrix explicitly and then multi¬ 
plying it by a vector v, we can use the following formula for the Hessian-vector 
multiplication 
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= 2A* {ip’^fZix'^)U’^Zix'^)Aiv)Zix'^)) , (23) 

where we assume that A is linear of the form A{x) = 
denotes its adjoint. Hence, in each CG step, we only have to evaluate matrices 
A{v) (which is simple), Z{x^) and Z{x^)U^Z{x^) (which are needed in the 
gradient computation, anyway), and perform two additional matrix-matrix 
products. The resulting complexity formula for one Hessian-vector product is 
thus 0(vA -\- Kn), where again K denotes the maximal number of nonzero 
entries in ^'(x), i = 1, 2 ,..., n. 

The additional (perhaps the main) advantage of this approach is the fact 
that we do not have to store the Hessian in the memory, thus the memory 
requirements (often the real bottleneck of SDP codes) are drastically reduced. 

Dense versus sparse 

For the efficiency of Pensdp, it is important to know if the problem has 
a sparse or dense Hessian. The program can check this automatically. The 
check, however, may take some time and memory, so if the user knows that 
the Hessian is dense (and this is the case of most problems), this check can be 
avoided. This, for certain problems, can lead to substantial savings not only 
in CPU time but also in memory requirements. 

Hybrid mode 

For linear semidefinite programming problems, we use the following hybrid 
approach, whenever the number of variables n is large compared to the size of 
the matrix constraint m: We try to solve the linear systems using the iterative 
approach as long as the iterative solver needs a moderate number of iterations. 
In our current implementation the maximal number of CG iterations allowed 
is 100. Each time the maximal number of steps is reached, we solve the system 
again by the Cholesky method. Once the system is solved by the Cholesky 
method, we use the Cholesky factor as a preconditioner for the iterative solver 
in the next system. As soon as the iterative solver fails three times in sequel, 
we completely switch to the Cholesky method. 

The hybrid mode allows us to reach a high precision solution while keep¬ 
ing the solution time low. The main reason is that, when using the iterative 
approach, the Hessian of the Augmented Lagrangian has not to be calculated 
explicitly. 

User interfaces 

The user has a choice of several interfaces to Pensdp. 

SDPA interface 

The problem data are written in an ASCII input file in a SDPA sparse format, 
as introduced in [9]. The code needs an additional ASCII input file with 
parameter values. 



PENNON: Software for linear and nonlinear matrix inequalities 


17 


C/C++/FORTRAN interface 

Pensdp can also be called as a function (or subroutine) from a C, CH—h 
or Fortran program. In this case, the user should link the Pensdp library 
to his/her program. In the program the user then has to specify problem 
dimensions, code parameters and the problem data (vectors and matrices) in 
a sparse format. 

Matlab interface 

In Matlab, Pensdp is called with the following arguments: 

[f,x,u,iflag,niter,feas] = pensdpm(pen); 

where pen a Matlab structure array with fields describing the problem di¬ 
mensions and problem data, again in a sparse format. 

Yalmip interface 

The most comfortable way of preparing the data and calling Pensdp is via 
Yalmip [22]. Yalmip is a modelling language for advanced modeling and 
solution of convex and nonconvex optimization problems. It is implemented 
as a free toolbox for MATLAB. When calling Pensdp from Yalmip , the user 
does not has to bother with the sparsity pattern of the problem—any linear 
optimization problem with vector or matrix variables will be translated by 
Yalmip into Pensdp data structure. 

3.2 Numerical experiments 

It is not our goal to compare Pensdp with other linear SDP solvers. This is 
done elsewhere in this book and the reader can also consult the benchmark 
page of Hans Mittelmann^, containing contemporary results. We will thus 
present only results for selected problems and will concentrate on the effect of 
special features available in Pensdp. The results for the ‘mater’ and ‘roselS’ 
problems were obtained on an Intel Core i7 processor 2.67GHz with 4GB 
memory. 

Sparsity: ‘mater’ problems 

Let us consider the ‘mater*’ problems from Mittelmann’s collection^. These 
problems are significant by several different sparsity patterns of the problem 
data. The problem has many small matrix constraints, the data matrices 
are sparse, only very few variables are involved in each constraint and the 
resulting Hessian matrix is sparse. We cannot switch off sparsity handling in 

® plato.la.asu.edu/bench.html 
plato.asu.edu/ftp/sparse_sdp.html 
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routines for Hessian assembling but we can run the code with (forced use of) 
dense Cholesky factorization and with sparse Cholesky routine. For instance, 
problem ‘materS’ with 1439 variables and 328 matrix constraints of size 11 was 
solved in 32 seconds using the dense Cholesky and only 4 seconds using the 
sparse Cholesky routine. The difference is, of course, more dramatic for larger 
problems. The next problem ‘mater4’ has 4807 variables and 1138 matrix 
constraints of size 11. While the sparse version of Pensdp only needed 20 
seconds to solve it, the dense version needed 1149 second. And while the 
largest problem ‘materb’ (20463 variables and 4968 matrix constraints) does 
not even fit in the 4GB memory for the dense version, the sparse code needs 
only 100MB and solves the problem in 134 seconds. 

Iterative solver: ‘TOH’ collection 

The effect of the use of preconditioned conjugate gradient method for the 
solution of the Newton system was described in detail in [18, 19]. Recall that 
iterative solvers are suitable for problems with a large number of variable 
and relatively small constraint matrices. We select from [18, 19] two examples 
arising from maximum clique problems on randomly generated graphs (the 
’TOH’ collection in [18]). The first example is ‘theta62’ with 13390 variables 
and matrix size 300. This problem could still be solved using the direct (dense 
Cholesky) solver and the code needed 13714 seconds to solve it. Compared to 
that, the iterative version of the code only needed 40 seconds to obtain the 
solution with the same precision. The average number of CG steps in each 
Newton system was only 10. The largest problem solved in the paper was 
‘thetal62’ with 127600 variables and a matrix constraint of size 800. Note 
that the Hessians of this example is dense, so to solve the problem by the 
direct version of Pensdp (or by any other interior-point algorithm) would 
mean to store and factorize a full matrix of dimension 127600 by 127600. 
On the other hand, the iterative version of Pensdp, being effectively a first- 
order code, has only modest memory requirements and allowed us to solve 
this problem in only 672 seconds. 

Hybrid mode: ‘rosel3’ 

To illustrate the advantages of the hybrid mode, we consider the problem 
‘rosel3’ from Mittelmann’s collection®. The problem has 2379 variables and 
one matrix constraint of size 105. When we solve the problem by Pensdp with 
a direct solver of the Newton system, the code needs 17 global iterations, 112 
Newton steps and the solution is obtained in 188 seconds CPU time, 152 
seconds of which is spent in the Cholesky factorization routine. 

Let us now solve the problem using the iterative solver for the Newton 
systems. Below we see the first and the last iterations of Pensdp. The required 
precision of DIMACS criteria is ^dimacs = 10“®. 

® plato.asu.edu/ftp/sparse_sdp.html 
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* 

it 1 

obj 

1 

opt 1 

Nwt 

1 

CG * 


1 

01 

O.OOOOe+000 

1 

O.OOOOe+000 1 

0 

1 

0 1 

1 

11 

1.8893e+003 

1 

8.3896e+000 I 

10 

1 

321 1 

1 

21 

2.2529e+002 

1 

8.2785e+000 I 

17 

1 

1244 1 

1 

91 

-1.1941e+001 

1 

2.2966e+000 I 

36 

1 

9712 1 

1 

101 

-1.1952e+001 

1 

4.9578e+000 I 

46 

1 

10209 1 

1 

151 

-1.1999e+001 

1 

5.0429e-002 I 

119 

1 

103905 

1 

161 

-1.1999e+001 

1 

4.4050e-003 I 

134 

1 

167186 




The table shows the global iterations of Algorithm 1, the value of the objective 
function and the gradient of the augmented Lagrangian and, in the last two 
columns, the cumulative number of Newton and CG steps. The code needed a 
large number of CG steps that was growing with increasing conditioning of the 
Newton system. The problem was solved in 732 seconds of CPU time. When 
we try to solve the same problem with a higher precision (Jdimacs = 10”^), 
the iterative method, and consequently the whole algorithm, will get into 
increasing difficulties. Below we see the last two iterations of Pensdp before 
it was stopped due to one-hour time limit. 

I 281 -1.2000e+001 I 5.2644e-002 I 373 I 700549 I 

I 291 -1.2000e+001 I 3.0833e-003 I 398 I 811921 I 

We can see that the optimality criterium is actually oscillating around 10“^. 

We now switch the hybrid mode on. The difference will be seen already 
in the early iterations of Pensdp. Running the problem with (^dimacs = 10“^, 
we get the following output 

* it I obj I opt I Nwt I CG * 

I 01 O.OOOOe+000 I O.OOOOe+000 I 0 I 0 I 

I II 1.8893e+003 I 8.3896e+000 I 10 I 321 I 

I 21 2.3285e+002 I 1.9814e+000 I 18 I 848 I 

I 91 -1.1971e+001 I 4.5469e-001 I 36 I 1660 I 

I 101 -1.1931e+001 I 7.4920e-002 I 63 I 2940 I 

I 131 -1.1998e+001 I 4.1400e-005 I 104 I 5073 I 

I 141 -1.1999e+001 I 5.9165e-004 I 115 I 5518 I 
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The CPU time needed was only 157 seconds, 130 of which were spent in 
the Cholesky factorization routine. When we now increase the precision to 
(5 dimacs = 10“^, the Pensdp with the hybrid mode will only need a few more 
iterations to reach it: 


I 161 -1.2000e+001 I 9.8736e-009 I 142 I 6623 I 

I 171 -1.2000e+001 I 5.9130e-007 I 156 I 7294 I 

The total CPU time increased to 201 seconds, 176 of which were spent in 
Cholesky factorization. 

Notice that the difference between the direct solver and the hybrid method 
would be even more significant for larger problems, such as ‘rosel5’. 


4 PENBMI 

We solve the SDP problem with quadratic objective function and linear and 
bilinear matrix inequality constraints: 

min ^x^Qx + f'^x (24) 

xSB" 2 

subject to 

n 

'^blxk<c\ i = l,...,W 

k^l 

n n n 

^0 + X! + X! X! XkXiKlf^ ^0, , 

k=l k=l 1=1 


where all data matrices are from S'". 

4.1 The code PENBMI 
User interface 

The advantage of our formulation of the BMI problem is that, although non¬ 
linear, the data only consist of matrices and vectors, just like in the linear SDP 
case. The user does not have to provide the (first and second) derivatives of 
the matrix functions, as these are readily available. Hence the user interface 
of Penbmi is a direct extension of the interface to Pensdp described in the 
previous section. 

In particular, the user has the choice of calling Penbmi from Matlab or 
from a C/C-l—f/Fortran code. In both cases, the user has to specify the ma¬ 
trices A\,k = 1,.. .n, and A,£ = 1,...n, for all constraints i = 1,...,iV, 
matrix Q from the objective function and vectors f,c,b^,i = 1,... ,N. As in 
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the linear SDP case, all matrices and vectors are assumed to be sparse (or 
even void), so the user has to provide the sparsity pattern of the constraints 
(which matrices are present) and sparsity structure of each matrix. 

Again, the most comfortable way of preparing the data and calling 
Penbmi is via Yalmip. In this case, the user does not has to stick to the 
formulation (24) and bother with the sparsity pattern of the problem—any 
optimization problem with vector or matrix variables and linear or quadratic 
objective function and (matrix) constraints will be translated by Yalmip into 
formulation (24) and the corresponding user interface will be automatically 
created. Below is a simple example of Yalmip code for the LQ optimal feed¬ 
back problem formulated as 


min trace(P) 

PeB2x2_ j:fgRlX2 

s.t. {A + BKfP + P{A + BK)^-l 2 x 2 -K^K 

F ^ 0 

with 

Using 
lines 

» A = [-1 2;-3 -4] ; B = [1;1] ; 

>> P = sdpvar(2,2); K = sdpvar(l,2); 

» F = [P >= 0; (A+B*K)’*P+P*(A+B*K) <= -eye(2)-K’*K]; 
>> optimize(F,trace(P),sdpsettings(’solver’,’penbmi’)); 


A = 


-1 2 
-3 -4 


B = 


Yalmip, the problem is formulated and solved by the following few 


4.2 The Static Output Feedback Problem 

Many interesting problems in linear and nonlinear systems control cannot be 
formulated and solved as LSDP. BMI formulation of the control problems was 
made popular in the mid 1990s [12]; there were, however, no computational 
methods for solving non-convex BMIs, in contrast with convex LMIs for which 
powerful interior-point algorithms were available. 

The most fundamental of these problems is perhaps static output feedback 
(SOF) stabilization: given a triplet of matrices A,B,C of suitable dimensions, 
find a matrix F such that the eigenvalues of matrix A + BFC are all in a 
given region of the complex plane, say the open left half-plane [3]. 

No LSDP formulation is known for this problem but a straightforward 
application of Lyapunov’s stability theory leads to a BMI formulation: matrix 
A -F BFC has all its eigenvalues in the open left half-plane if and only if there 
exists a matrix X such that 


{A + BFC^X + {A + BFC)X ^0, A = ^ 0 
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where ^ 0 and 0 stand for positive and negative definite, respectively. 

We present a short description of the benchmark collection COMPlJb : 
the constrained Matrix-optimization Problem library [20]®. COMPlJb can 
be used as a benchmark collection for a very wide variety of algorithms solving 
matrix optimization problems. Currently COMPlJb consists of 124 examples 
collected from the engineering literature and real-life applications for LTI 
control systems of the form 

x{t) = Ax{t) + Biw{t) + Bu{t), 

z\t) = Cix{t) -I- Diiw{t) + Di 2 u(t), (25) 

y{t) = Cx{t) + D 2 iw{t), 


where x G R”®, u G IR”“, y G R””, 0 G w G R”” denote the state, control 
input, measured output, regulated output, and noise input, respectively. 

The heart of COMPlJb is the MATLAB function file COMPleib.m. This 
function returns the data matrices A, Bi, B, Ci, C, Du, and D 21 of (25) 
of each individual COMPlJb example. Depending on specific control design 
goals, it is possible to derive particular matrix optimization problems using 
the data matrices provided by COMPlJb. A non exhaustive list of matrix 
optimization problems arising in feedback control design are stated in [20]. 
Many more control problems leading to NSDPs, BMIs or SDPs can be found 
in the literature. 

Here we state the BMI formulation of two basic static output feedback 
control design problems: SOF-'H 2 and SOF-'Hoo- The goal is to determine 
the matrix F G R"“^"t/ of the SOF control law u{t) = Fy{t) such that the 
closed loop system 

x{t) = A{F)x{t) + B{F)w{t), , . 

z{t) = C(F)x(t) + D{F)wit), ^ > 

fulfills some specific control design requirements, where A{F) = A + BFC, 
B{F) = BiF BFD 21 , C{F) =Ci+ D 12 FC, D{F) = Du + D 12 FD 21 . 

We begin with the SOF-'H 2 problem: Suppose that Du = 0 and D 21 = 0. 
Find a SOF gain F such that A{F) is Hurwitz and the 'H 2 -norm of (26) is 
minimal. This problem can be rewritten to the following 'H 2 “BMI problem 
formulation, see, e.g. [20]: 


min Tr{X) s.t. Q 0, 

[A + BFC)Q F Q{A F BFCf F BiBj F 0, 


A (Cl + Di2FC)Q 

g(Ci + DuFCf Q 


^ 0 , 


(27) 


where Q G R"»x"», A G R”-x«._ 


See http: //www.mathematik. uni-trier. de/~leibfritz/ 
Proj_TestSet/NSDPTestSet.htm 
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"Hoo synthesis is an attractive model-based control design tool and it al¬ 
lows incorporation of model uncertainties in the control design. The optimal 
SOF-'Hoo problem can be formally stated in the following term: Find a SOF 
matrix F such that A{F) is Hurwitz and the T-Loo-norm of (26) is minimal. 
We consider the following well known "Hoo-BMI version, see, e.g. [20]: 

min 7 s.t. X >- 0, 7 > 0, 

' A{FYX + XA{F) XB{F) C{F f 
B{FYX -7 D{FY 
C{F) D{F) -7/„, 


^ 0 , 


(28) 


where 7 S K, G 

We present results of our numerical experiences for the static output feed¬ 
back problems of COMPlJb . The link between COMPlJb and PENBMI was 
provided by the MATLAB parser YALMIP 3 [22]. All tests were performed on 
a 2.5 GHz Pentium with 1 GB RDRAM under Linux. The results of PENBMI 
for 'H 2 -BMI and "Hoo-BMI problems can be divided into seven groups: The 
first group consists of examples solved without any difficulties (38 problems 
in the "^2 case and 37 problems for the 'Hoc setting). The second and third 
group contain all cases for which we had to relax our stopping criterion. In 
4 (II) examples the achieved precision was still close to our predefined stop¬ 
ping criterion, while in 5 (7) cases deviation is signihcant (referring to 'H 2 
i'Hoo))- Then there are examples, for which we could calculate almost feasible 
solutions, but which failed to satisfy the Hurwitz-criterion, namely AC5 and 
NNIO. The fourth and fifth group consist of medium and small scale cases 
for which PENBMI failed, due to ill conditioned Hessian of F —the Gholesky 
algorithm used for its factorization did not deliver accurate solution and the 
Newton method failed. In the 'H 2 -setting (fourth group) these are AC7, AC9, 
AGI3, AC18, JEl, JE2, JE3, REA4, DIS5, WEGI, WEG2, WEG3, UWV, 
PAS, NNl, NN3, NN5, NN 6 , NN7, NN9, NN12 and NN17, in the "Hoo-setting 
(fifth group) JEl, JE2, JE3, REA4, DIS5, UWV, PAS, TF3, NNl, NN3, NN5, 
NN 6 , NN7 and NN13. The cases in the sixth group are large scale, ill condi¬ 
tioned problems, where PENBMI ran out of time (AGIO, AC14, CSE2, EB5). 
Finally, for very large test cases our code runs out of memory (HSl, BDT2, 
EB 6 , TL, GDP, NN18). 

4.3 Simultaneous Stabilization BMIs 

Another example leading to BMI formulation is the problem of simultane¬ 
ously stabilizing a family of single-input single-output linear systems by one 
fixed controller of given order. This problem arises for instance when trying 
to preserve stability of a control system under the failure of sensors, actua¬ 
tors, or processors. Simultaneous stabilization of three or more systems was 
extensively studied in [2]. Later on, the problem was shown to belong to the 
wide range of robust control problems that are NP-hard [3]. 
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In [14] a BMI formulation of the simultaneous stabilization problem was 
obtained in the framework of the polynomial, or algebraic approach to systems 
control. This formulation leads to a feasibility BMI problem which, in a more 
general setting can be reformulated by the following procedure: Assume we 
want to hnd a feasible point of the following system of BMIs 

n n n 

+ + i = l,...,iV (29) 

k=l k=l £=1 

with symmetric matrices A\,K1^ G k,£ = 1 ,... ,n, i = 1,... ,N, and 

X G R". Then we can check the feasibility of (29) by solving the following 
optimization problem 

min A (30) 

seR^.AeR 

n n n 

s.t. + XkXtKlt, 4 XI„, i = (31) 

fc=i fc=i t=\ 

Problem (30) is a global optimization problem: we know that if its global 
minimum A is non-negative then the original problem (29) is infeasible. On 
the other hand Penbmi can only hnd critical points, so when solving (30), 
the only conclusion we can make is the following: 

when A < 0, the system is strictly feasible; 
when A = 0, the system is marginally feasible; 
when A > 0 the system may be infeasible. 

During numerical experiments it turned out that the feasible region of (29) 
is often unbounded. We used two strategies to avoid numerical difficulties in 
this case: First we introduced large enough artihcial bounds Abound- Second, 
we modify the objective function by adding the square of the 2-norm of the 
vector X multiplied by a weighting parameter w. After these modihcations 
problem (30) reads as follows: 

min A-t-iujlccjlo (32) 

seR^.AeR 

s.t. 3:'bound “Si X ^ :rbound, ^ — 1, . . . , TZ 

n n n 

XkXi^Kj^^ XInxrn ^ * 7 ^ • 

k=l k=l £=1 

This is exactly the problem formulation we used in our numerical experiments. 

Results of numerical examples for a suite of simultaneous stabilization 
problems selected from the recent literature can be found in [13]. 
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5 PENNON 


5.1 The problem and the modified algorithm 
Problem formulation 


In this, so far most general version of the code, we solve optimization prob¬ 
lems with a nonlinear objective subject to nonlinear inequality and equality 
constraints and semidefinite bound constraints: 


min f(x,Y) 

xm",Yi&SPi,...,Yk&SPk 


subject to gi{x,Y)<0, 

i = l,. 

..,mg 

II 

o 

i = l,. 

. .,mh 

A ^ A ■) 

i = l,. 

.. ,fc. 


(33) 


Here 

• X G K" is the vector variable 

• Yi G ,Yk G are the matrix variables; we denote Y = (Yi,... ,Yk) 

• f,gi and hi are functions from M” x x ... x to K 

• Aj and Ai are the lower and upper bounds, respectively, on the eigenvalues 
of Yi, i = 1 ,..., fc 

Although the semidefinite inequality constraints are of a simple type, most 
nonlinear SDP problems can be formulated in the above form. For instance, 
the problem (1) can be transformed into (33) using slack variables and equality 
constraints, when 

A{x) =4 0 

is replaced by 


A{x) = S element-wise 
S' ^ 0 

with a new matrix variable S G S™. 

Direct equality handling 

Problem (33) is not actually a problem of type (1) that was introduced in 
the first section and for which we have developed the convergence theory. The 
new element here are the equality constraints. Of course, we can formulate 
the equalities as two inequalities, and this works surprisingly well for many 
problems. However, to treat the equalities in a “proper” way, we adopted a 
concept which is successfully used in modern primal-dual interior point algo¬ 
rithms (see, e.g., [32]): rather than using augmented Lagrangians, we handle 
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the equality constraints directly on the level of the subproblem. This leads to 
the following approach. Consider the optimization problem 


min f(x) 

£CGR" ^ ^ 

subject to 

A{x) =4 0, 
h{x) = 0, 


(34) 


where / and A are defined as in the previous sections and h : R” —>■ R.'^ repre¬ 
sents a set of equality constraints. Then we define the augmented Lagrangian 


F{x,U,v,p) = 

f{x) + {U,^p{A{x)))sr>^+v^h{x), (35) 


where are defined as before and v G is the vector of Lagrangian 

multipliers associated with the equality constraints. Now, on the level of the 
subproblem, we attempt to find an approximate solution of the following sys¬ 
tem (in X and v): 


V^F{x,U,v,p) = 0, 
h{x) = 0, 


(36) 


where the penalty parameter p as well as the multiplier U are fixed. In order 
to solve systems of type (36), we apply the damped Newton method. De¬ 
scent directions are calculated utilizing the factorization routine MA27 from 
the Harwell subroutine library ([6]) in combination with an inertia correction 
strategy as described in [32]. Moreover, the step length is derived using an 
augmented Lagrangian merit function defined as 


F{x,U,v,p) + ^ll^(a;)ll 


2 

2 


along with an Armijo rule. 


Strictly feasible constraints 

In certain applications, the bound constraints must remain strictly feasible for 
all iterations because, for instance, the objective function may be undefined 
at infeasible points [17]. To be able to solve such problems, we treat these 
inequalities by a classic barrier function. For this reason we introduce an 
additional matrix inequality 

S{x) =4 0 

in problem (2) and define the augmented Lagrangian 

F{x,U,p,s) = f{x) + {U,<Pp{A{x)))sr^ -h s^bar(5(a;)), 


(37) 
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where <?bar can be defined, for example, by 

<Pbar{S{x)) = - logdet(-5(x)). 

Note that, while the penalty parameter p maybe constant from a certain index 
k (see again [31] for details), the barrier parameter s is required to tend to 
zero with increasing k. 

5.2 The code PENNON 
Slack removal 

As already mentioned, to transform constraints of the type A{x) 0 into 
our standard structure, we need to introduce a slack matrix variable A, and 
replace the original constraint by A{x) = S element-wise, and S ^ 0. Thus in 
order to formulate the problem in the required form, we have to introduce a 
new (possibly large) matrix variable and many new equality constraints, which 
may have a negative effect on the performance of the algorithm. However, the 
reformulation using slack variables is only needed for the input of the problem, 
not for its solution by Algorithm 1. Hence, the user has the option to say that 
certain matrix variables are actually slacks and these are then automatically 
removed by a preprocessor. The code then solves the problem with the original 
constraint A{x) ^ 0. 


User interface 


Unlike in the Pensdp and Penbmi case, the user has to provide not only 
function values but also the first and second derivatives of the objective and 
constraint functions. In the Matlab and C/C-|--|-/Fortran interface the user 
is required to provide six functions/subroutines for evaluation of function 
value, gradient and Hessian of the objective function and the constraints, 
respectively, at a given point. 

To make things simple, the matrix variables are treated as vectors in these 
functions, using the operator svec : S’” —>■ defined by 


svec 


/ On 012 
022 

\sym 


Olm 

^2m 


— (oil, 012, 022, ■ • • , Oim, 02r: 


In the main program, the user defines problem sizes, values of bounds, and 
information about matrix variables (number, sizes and sparsity patterns). 

In addition, we also provide an interface to Ampl [7] which is a comfortable 
modelling language for optimization problems. As Ampl does not support 
matrix variables, we treat them, within an Ampl script, as vectors, using the 
operator svec defined above. 




The code needs to identify the matrix variables, their number and size. 
These data are included in an ASCII file that is directly read by the Pen¬ 
non and that, in addition, includes information about lower and upper bounds 
on these variables. 


5.3 Examples 

Most examples of nonlinear semidefinite programs that can be found in the 
literature are of the form: for a given (symmetric, indefinite) matrix H find a 
nearest positive semidefinite matrix satisfying possibly some additional con¬ 
straints. Many of these problems can be written as follows 

mmh\X-H\\l (38) 

xeS" 2 

subject to 

{Ai,X) = bi, i = 1,. . .,m 
XhO 

with Ai G S", i = Probably the most prominent example is the 

problem of finding the nearest correlation matrix [15]. 

Several algorithms have been derived for the solution of this problems; see, 
e.g., [15, 23]. It is not our primal goal to compete with these specialized algo¬ 
rithms (although Pennon can solve problems of type (38) rather efficiently). 
Rather we want to utilize the full potential of our code and solve “truly non¬ 
linear” semidefinite problems. In the rest of this section we will give examples 
of such problems. 
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5.4 Correlation matrix with the constrained condition number 

We consider the problem of finding the nearest correlation matrix: 

n 

^ (Xij - (39) 

subject to 
^ii — I5 

X ^ 0 

We will consider an example based on a practical application from finances; 
see [33]. Assume that a 5 x 5 correlation matrix is extended by one row and 
column. The new data is based on a different frequency than the original part 
of the matrix, which means that the new matrix is no longer positive definite: 

/ 1 -0.44 -0.20 0.81 -0.46 -0.05\ 

-0.44 1 0.87 -0.38 0.81 -0.58 

_ -0.20 .87 1 -0.17 0.65 -0.56 

“ 0.81 -0.38 -0.17 1 -0.37 -0.15 ' 

-0.46 0.81 0.65 -0.37 1 -0.08 

\-0.05 -0.58 -0.56 -0.15 0.08 1 / 

Let us find the nearest correlation matrix to iJext by solving (39) (either by 
Pennon or by any of the specialized algorithms mentioned at the beginning 
of this section). We obtain the following result (for the presentation of results, 
we will use Matlab output in short precision): 


X = 


1.0000 

-0.4420 

-0.2000 

0.8096 

-0.4585 

-0.0513 

-0.4420 

1.0000 

0.8704 

-0.3714 

0.7798 

-0.5549 

-0.2000 

0.8704 

1.0000 

-0.1699 

0.6497 

-0.5597 

0.8096 

-0.3714 

-0.1699 

1.0000 

-0.3766 

-0.1445 

-0.4585 

0.7798 

0.6497 

-0.3766 

1.0000 

0.0608 

-0.0513 

-0.5549 

-0.5597 

-0.1445 

0.0608 

1.0000 

with eigenvalues 






eigen = 

0.0000 

0.1163 

0.2120 

0.7827 

1.7132 

3.1757 


As we can see, one eigenvalue of the nearest correlation matrix is zero. This 
is highly undesirable from the application point of view. To avoid this, we 
can add lower (and upper) bounds on the matrix variable, i.e., constraints 
A/ ^ A ^ XI. However, the application requires a different approach when 
we need to bound the condition number of the nearest correlation matrix, i.e., 
to add the constraint 
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cond(X) < K. 

This constraint can be introduced in several ways. For instance, we can intro¬ 
duce the constraint ^ 

I <kI 

using the transformation X = C,X. The problem of finding the nearest corre¬ 
lation matrix with a given condition number then reads as follows: 

" 1 ~ 

(40) 

subject to 

Xj^'l ^ — 0 , % — 

I <X<kI . 

The new problem now has the NLP-SDP structure of (33). When solving it 
by Pennon with k = 10, we get the solution after 11 outer and 37 inner 
iterations. The optimal value of C, is 3.4886 and, after the back substitution 
X = ^77, we get the nearest correlation matrix 


X = 


1.0000 

-0.3775 

-0.2230 

0.7098 

-0.4272 

-0.0704 

-0.3775 

1.0000 

0.6930 

-0.3155 

0.5998 

-0.4218 

-0.2230 

0.6930 

1.0000 

-0.1546 

0.5523 

-0.4914 

0.7098 

-0.3155 

-0.1546 

1.0000 

-0.3857 

-0.1294 

-0.4272 

0.5998 

0.5523 

-0.3857 

1.0000 

-0.0576 

-0.0704 

-0.4218 

-0.4914 

-0.1294 

-0.0576 

1.0000 

with eigenvalues 






eigenvals = 
0.2866 

0.2866 

0.2867 

0.6717 

1.6019 

2.8664 


and the condition number equal to 10, indeed. 


Large-scale problems 

To test the capability of the code to solve large-scale problems, we have gen¬ 
erated randomly perturbed correlation matrices H of arbitrary dimension by 
the commands 

n = 500; x=10.' [-4:4/(n-1):0]; 

G = gallery(’randcorr’,n*x/sum(x)); 

E = 2*rand(n,n)-ones(n,n); E=triu(E)+triu(E,1)’; E=(E+E’)/2; 

H = (1-0.1).*G + 0.1*E; 
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For large-scale problems, we successfully use the iterative (preconditioned con¬ 
jugate gradient) solver for the Newton system in Step 1 of the algorithm. In 
every Newton step, the iterative solver needs just a few iterations, making it 
a very efficient alternative to a direct solver. 

For instance, to solve a problem with a 500 x 500 matrix H we needed 11 
outer and 148 inner iterations, 962 CG steps, and 21 minutes on a notebook. 
Note that the problem had 125251 variables and 500 linear constraints: that 
means that at each Newton step we solved (approximately) a system with a 
full 125251 X 125251 matrix. The iterative solver (needing just matrix-vector 
product) was clearly the only alternative here. 

We have also successfully solved a real-world problem with a matrix of 
dimension 2000 and with many additional linear constraints in about 10 hours 
on a standard Linux workstation with 4 Intel Core 2 Quad processors with 
2.83 GHz and 8 Gbyte of memory (using only one processor). 

5.5 Approximation by nonnegative splines 

Consider the problem of approximating a one-dimensional function given only 
by a large amount of noisy measurement by a cubic spline. Additionally, we 
require that the function is nonnegative. This kind of problem arises in many 
application, for instance, in shape optimization considering unilateral contact 
or in arrival rate approximation [1]. 

Assume that function / : R —>■ ffi. is defined on interval [0,1]. We are 
given its function values bi, i = 1,... ,n at points ti € (0,1). We may further 
assume that the function values are subject to a random noise. We want to 
find a smooth approximation of / by a cubic spline, i.e., by a function of the 
form 

P(t)=P«(t)=^P«(t-a._i)'= (41) 

k=0 

for a point t € [at-i, at], where 0 = oq < oi < ... < Um = 1 are the knots 
and P ^*^(i = 1,... ,m, fc = 0,1, 2, 3) the coefficients of the spline. The spline 
property that P should be continuous and have continuous first and second 
derivatives is expressed by the following equalities for i = 1,..., m — 1: 

pb+l)_ ^ 0 

(42) 

- Pf^ - 2P«(a, - a,_i) - SP^'^a, - = 0 (43) 

2pb-ei) _ 2pb) _ 6FW(a, - a*_i) = 0 . (44) 

The function / will be approximated by P in the least square sense, so we 
want to minimize 

n 

j=i 
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subject to (42),(43),(44). 

Now, the original function / is assumed to be nonnegative and we also 
want the approximation P to have this property. A simple way to guarantee 
nonnegativity of a spline is to express is using B-splines and consider only non¬ 
negative i3-spline coefficients. However, it was shown by de Boor and Daniel 
[5] that this may lead to a poor approximation of /. In particular, they showed 
that while approximation of a nonnegative function by nonnegative splines of 
order k gives errors of order , approximation by a subclass of nonnegative 
splines of order k consisting of all those whose B-spline coefficients are non¬ 
negative may yield only errors of order /i^. In order to get the best possible 
approximation, we use a result by Nesterov [26] saying that from (41) 

is nonnegative if and only if there exist two symmetric matrices 

^(z) _ 

\yi Zi) ’ [vi Wi 

such that 

Pq '’ = (ai - a^-i)si 
pI'^ = Xi- Si + 2(ai - ai-i)vi 
P^'^ = 2yi - 2vi + (at - ai-i)wi 
= Zi — Wi 

S'W^O. 

Summarizing, we want to solve an NLP-SDP problem 

n 

min (50) 

py'eR 

fc=0,l,2,3 

subject to 

(42), (43), (44), i = I,...,m 
(45)-(49), i = I,...,m. 

More complicated (“more nonlinear”) objective functions can be obtained 
when considering, for instance, the problem of approximating the arrival rate 
function of a non-homogeneous Poisson process based on observed arrival data 
[!]■ 

Example 2. A problem of approximating a cosine function given at 500 points 
by noisy data of the form cos (4*pi*rand(500,1) )+l+. 5 . *rand(500,1)- . 25 
approximated by a nonnegative cubic spline with 7 knots lead to an NSDP 
problem in 80 variables, 16 matrix variables, 16 matrix constraints, and 49 
linear inequality constraints. The problem was solved by Pennon in about I 
second using 17 global and 93 Newton iterations. 


(45) 

(46) 

(47) 

(48) 

(49) 
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